Mining Billion-Scale Graphs: Patterns and Algorithms
Authors
Abstract
Graphs are everywhere: social networks, the World Wide Web, biological networks, and many more. Graph sizes are growing at an unprecedented rate, reaching millions and billions of nodes and edges. What are the patterns in large graphs spanning gigabytes and terabytes, and heading toward petabytes? What are the best tools, and how can they help us solve graph mining problems? How do we scale up algorithms to handle graphs with billions of nodes and edges? These questions are the focus of this tutorial. We start with the patterns in real-world static, weighted, and dynamic graphs. Then we describe important tools for large graph mining, including singular value decomposition and Hadoop. Finally, we conclude with the design and implementation of scalable graph mining algorithms on Hadoop. This tutorial is complementary to the related tutorial "Managing and Mining Large Graphs: Systems and Implementations".
Similar resources
Net-Ray: Visualizing and Mining Billion-Scale Graphs
How can we visualize billion-scale graphs? How to spot outliers in such graphs quickly? Visualizing graphs is the most direct way of understanding them; however, billion-scale graphs are very difficult to visualize since the amount of information overflows the resolution of a typical screen. In this paper we propose NET-RAY, an open-source package for visualization-based mining on billion-scale ...
WOOster: A Map-Reduce based Platform for Graph Mining
Large-scale graphs containing O(billion) vertices are becoming increasingly common in various applications. With graphs of such proportions, efficient querying infrastructure becomes crucial. In this paper, we propose WOOster, a hosted querying infrastructure designed specifically for large graphs. We make two key contributions: a) Design of the WOOster framework. b) Scalable map-reduce alg...
Mining Tera-Scale Graphs: Theory, Engineering and Discoveries
How do we find patterns and anomalies, on graphs with billions of nodes and edges, which do not fit in memory? How to use parallelism for such Tera- or Peta-scale graphs? In this thesis, we propose PEGASUS, a large scale graph mining system implemented on the top of the HADOOP platform, the open source version of MAPREDUCE. PEGASUS includes algorithms which help us spot patterns and anomalous beh...
Research Statement - Tera-Scale Graph Analysis
My vision is to design and implement a big data analytics system that finds useful patterns and anomalies in graphs. Graphs are ubiquitous: computer networks, social networks, mobile call networks, protein regulation networks, and the World Wide Web, to name a few. The large volume of available data, the low cost of storage and the stunning success of online social networks and Web 2.0 applicatio...
CoreScope: Graph Mining Using k-Core Analysis - Patterns, Anomalies and Algorithms
What do the k-core structures of real-world graphs look like? What are the common patterns and the anomalies? How can we use them for algorithm design and applications? A k-core is the maximal subgraph where all vertices have degree at least k. This concept has been applied to such diverse areas as hierarchical structure analysis, graph visualization, and graph clustering. Here, we explore perva...